Vectors

At the end of this tutorial, you will be know :

Introduction

Typically, there are three types of vectors: addrefslide

  • Points
  • Lines
  • Polygons

The most common formats are: addrefslide

  • ESRI shapefile (.shp) : the traditional and most frequently used format
  • Geopackages (.gpkg) : the new standard
  • GPS Exchange Format (.gpx)
  • Google Earth file (.kml / .kmz)

There are two main packages in R to work with spatial vectors: terra and sf (that replace older packages such as raster and sp). This tutorial will mostly use the package terra for smoother integration with the rest of the tutorial (especially working with rasters), but most functions have their equivalent in sf.

At the start of the tutorial, equivalent functions in sf will be shown.

Let’s start with loading the required packages.

library(terra) |> suppressPackageStartupMessages()
library(mapview)
library(here) |> suppressPackageStartupMessages()
library(sf) |> suppressPackageStartupMessages()

Points

Creating spatial points from data

This is the most common case: you have coordinates from a (csv/excel/text) file and you want to get spatial information about these locations. The first step is to transform the coordinates as proper spatial object in R.

We will use a dataset from GBIF(addref) with all occurrences of otter recorded in 2021 within a 50km buffer from Montpellier, France. This data was prepared and stored on Github.

otter <- read.csv(here("data", "gbif_otter_2021_mpl50km.csv"))
otter <- read.csv(
  "https://github.com/FRBCesab/spatial-r/raw/main/data/gbif_otter_2021_mpl50km.csv"
)
dim(otter)
[1] 83 15
names(otter)
 [1] "key"                              "institutionCode"                 
 [3] "species"                          "occurrenceStatus"                
 [5] "eventDate"                        "year"                            
 [7] "month"                            "day"                             
 [9] "decimalLongitude"                 "decimalLatitude"                 
[11] "elevation"                        "identificationVerificationStatus"
[13] "identifier"                       "datasetKey"                      
[15] "provider"                        
table(otter$institutionCode)

 iNaturalist UAR PatriNat 
           7           76 

The dataset contains 83 observations of otter (in rows), and 15 variables (in column) such as date, coordinates, and data provider. The geographic coordinates are stored in the columns decimalLongitude, and decimalLatitude. They are expressed in decimal degrees, with the standard datum WGS84 EPSG:4326.

Let’s create a spatial object from the coordinates.

In terra, the key function is terra::vect().

pt_terra <- vect(
  otter,
  geom = c("decimalLongitude", "decimalLatitude"),
  crs = "EPSG:4326"
)
# summary
pt_terra
 class       : SpatVector 
 geometry    : points 
 dimensions  : 83, 13  (geometries, attributes)
 extent      : 3.28983, 4.46447, 43.30373, 44.06548  (xmin, xmax, ymin, ymax)
 coord. ref. : lon/lat WGS 84 (EPSG:4326) 
 names       :       key institutionCode     species occurrenceStatus
 type        :     <num>           <chr>       <chr>            <chr>
 values      : 3.059e+09     iNaturalist Lutra lutra          PRESENT
               3.854e+09    UAR PatriNat Lutra lutra          PRESENT
               3.854e+09    UAR PatriNat Lutra lutra          PRESENT
       eventDate  year month   day elevation identificationVerificationStatus
           <chr> <int> <int> <int>     <int>                            <chr>
 2021-01-24T15:~  2021     1    24        NA                               NA
      2021-01-13  2021     1    13        NA                         Probable
      2021-01-08  2021     1     8        NA                         Probable
 (and 3 more)
             
             
             
             
# visualization
plot(pt_terra)

In sf, the key function is sf::st_as_sf().

pt_sf <- st_as_sf(
  otter,
  coords = c("decimalLongitude", "decimalLatitude"),
  crs = "EPSG:4326"
)
# summary
pt_sf
Simple feature collection with 83 features and 13 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3.28983 ymin: 43.30373 xmax: 4.46447 ymax: 44.06548
Geodetic CRS:  WGS 84
First 10 features:
          key institutionCode     species occurrenceStatus           eventDate
1  3058935748     iNaturalist Lutra lutra          PRESENT 2021-01-24T15:26:55
2  3853886594    UAR PatriNat Lutra lutra          PRESENT          2021-01-13
3  3853886619    UAR PatriNat Lutra lutra          PRESENT          2021-01-08
4  3853886780    UAR PatriNat Lutra lutra          PRESENT          2021-01-12
5  4546764953    UAR PatriNat Lutra lutra          PRESENT          2021-01-31
6  4546968119    UAR PatriNat Lutra lutra          PRESENT          2021-01-17
7  4547054813    UAR PatriNat Lutra lutra          PRESENT          2021-01-31
8  4548321676    UAR PatriNat Lutra lutra          PRESENT          2021-01-23
9  4548484387    UAR PatriNat Lutra lutra          PRESENT          2021-01-31
10 4552966259    UAR PatriNat Lutra lutra          PRESENT          2021-01-06
   year month day elevation identificationVerificationStatus
1  2021     1  24        NA                             <NA>
2  2021     1  13        NA                         Probable
3  2021     1   8        NA                         Probable
4  2021     1  12        NA                         Probable
5  2021     1  31        NA                         Probable
6  2021     1  17        NA                         Probable
7  2021     1  31        NA                         Probable
8  2021     1  23        NA                         Probable
9  2021     1  31        NA                         Probable
10 2021     1   6        NA                         Probable
                             identifier                           datasetKey
1                              68608477 50c9509d-22c7-4a22-a47d-8c48425ef4a7
2  5b4c3803-50a9-465a-a1a7-35c4a4d7d508 c32f3129-a4dc-4e36-86d4-a35cc5cfb04d
3  cc533347-f1e3-4cf3-9f55-2c7cf60500de c32f3129-a4dc-4e36-86d4-a35cc5cfb04d
4  3210c6f2-391a-4cc4-b497-b915bbc9d7d4 c32f3129-a4dc-4e36-86d4-a35cc5cfb04d
5  631348e0-646f-41eb-b3c7-01900054c689 256b9877-cef3-4e8e-84e1-f23299c49655
6  754868ad-58ef-41eb-b3c7-01900054452d 256b9877-cef3-4e8e-84e1-f23299c49655
7  771217f4-6470-41eb-b3c7-01900054c695 256b9877-cef3-4e8e-84e1-f23299c49655
8  b00a76e4-907d-41eb-adf1-041006068357 256b9877-cef3-4e8e-84e1-f23299c49655
9  b2343db5-646f-41eb-b3c7-01900054c693 256b9877-cef3-4e8e-84e1-f23299c49655
10 33310104-5041-41eb-bfb5-01900053f23b 256b9877-cef3-4e8e-84e1-f23299c49655
                                                                                 provider
1                                                 iNaturalist Research-grade Observations
2  LIGNE NOUVELLE MONTPELLIER PERPIGNAN DUP PHASE 1 - Inventaire non protocolé Mammifères
3  LIGNE NOUVELLE MONTPELLIER PERPIGNAN DUP PHASE 1 - Inventaire non protocolé Mammifères
4  LIGNE NOUVELLE MONTPELLIER PERPIGNAN DUP PHASE 1 - Inventaire non protocolé Mammifères
5                                          UAR PatriNat - opportunistic data Faune-France
6                                          UAR PatriNat - opportunistic data Faune-France
7                                          UAR PatriNat - opportunistic data Faune-France
8                                          UAR PatriNat - opportunistic data Faune-France
9                                          UAR PatriNat - opportunistic data Faune-France
10                                         UAR PatriNat - opportunistic data Faune-France
                    geometry
1  POINT (3.792841 43.81636)
2   POINT (3.46399 43.38562)
3    POINT (3.28983 43.3172)
4   POINT (3.83235 43.57202)
5   POINT (3.46399 43.38562)
6   POINT (3.92099 44.06548)
7   POINT (3.85823 43.52581)
8   POINT (3.57398 43.63949)
9    POINT (3.28983 43.3172)
10  POINT (3.54517 43.75695)
# visualization
plot(pt_sf, max.plot = 1, axes = TRUE)

Handling spatial vectors

Let’s have a look at the most common functions to handle spatial vectors in R.

# get the geographical extent
ext(pt_terra)
SpatExtent : 3.28983, 4.46447, 43.30373, 44.06548 (xmin, xmax, ymin, ymax)
# get number of objects, number of attributes
dim(pt_terra)
[1] 83 13
# get the projection system
crs(pt_terra, describe = TRUE)
    name authority code  area             extent
1 WGS 84      EPSG 4326 World -180, 180, -90, 90
# get the coordinates
crds(pt_terra) |> head()
            x        y
[1,] 3.792841 43.81636
[2,] 3.463990 43.38562
[3,] 3.289830 43.31720
[4,] 3.832350 43.57202
[5,] 3.463990 43.38562
[6,] 3.920990 44.06548
# manipulate the dataset
# the attached data is formatted as a standard data.frame
# you can add a new column with '$' (for instance month labels)
pt_terra$lab_month <- month.name[pt_terra$month]

# filter rows with '[]' (for instance, keep only iNaturalist observations)
iNat_terra <- pt_terra[pt_terra$institutionCode == "iNaturalist", ]
dim(iNat_terra)
[1]  7 14
# get the geographical extent
st_bbox(pt_sf)
    xmin     ymin     xmax     ymax 
 3.28983 43.30373  4.46447 44.06548 
# get number of objects, number of attributes
dim(pt_sf)
[1] 83 14
# get the projection system
crs(pt_sf, describe = TRUE)
    name authority code  area             extent
1 WGS 84      EPSG 4326 World -180, 180, -90, 90
# get the coordinates
st_coordinates(pt_sf) |> head()
            X        Y
[1,] 3.792841 43.81636
[2,] 3.463990 43.38562
[3,] 3.289830 43.31720
[4,] 3.832350 43.57202
[5,] 3.463990 43.38562
[6,] 3.920990 44.06548
# manipulate the dataset
# the attached data is formatted as a standard data.frame
# you can add a new column with '$' (for instance month labels)
pt_sf$lab_month <- month.name[pt_sf$month]

# filter rows with '[]' (for instance, keep only iNaturalist observations)
iNat_sf <- pt_sf[pt_sf$institutionCode == "iNaturalist", ]
dim(iNat_sf)
[1]  7 15

Projection

There are different coordinate reference systems in which coordinates can be expressed (e.g. different unit). While manipulating multiple datasets from different sources, it is important to make sure they are all in the same coordinate system. If it is not the case, then you will need make projection to convert the projection system. addrefslide

For example, in France, the preferred projection system by IGN is Lambert 93 (EPSG:2154).As an exercise, let’s project our points to EPSG:2154.

In terra, the key function is terra::project().

# projection
pt_2154_terra <- project(pt_terra, "EPSG:2154")
# see the differences in the bounding box
# extent in lat/long
ext(pt_terra)
SpatExtent : 3.28983, 4.46447, 43.30373, 44.06548 (xmin, xmax, ymin, ymax)
# new extent in Lambert-93
ext(pt_2154_terra)
SpatExtent : 723524.638618867, 818474.392571545, 6245000.52485334, 6330038.9258292 (xmin, xmax, ymin, ymax)

In sf, the key function is sf::st_transform().

#projection
pt_2154_sf <- st_transform(pt_sf, crs = "EPSG:2154")
# see the differences in the bounding box
# extent in lat/long
st_bbox(pt_sf)
    xmin     ymin     xmax     ymax 
 3.28983 43.30373  4.46447 44.06548 
# new extent in Lambert-93
st_bbox(pt_2154_sf)
     xmin      ymin      xmax      ymax 
 723524.6 6245000.5  818474.4 6330038.9 

Export the data

For vectors, it is recommended to export them as geopackage file (extension .gpkg). Compare to traditional ESRI shapefile, the geopackage format store the data in a single file and the column names are preserved. (addref slide file format).

In terra, the function is terra::writeVector().

writeVector(pt_terra, here("data", "gbif_otter_2021_mpl50km.gpkg"))

In sf, the function is sf::write_sf() or sf::st_write. (why two options?)

write_sf(pt_sf, here("data", "gbif_otter_2021_mpl50km.gpkg"))

Conversion between sf and terra

It is confusing to have two dominant packages terra and sf having similar functionalities. Luckily, the conversion between these two format is easy.

To convert a terra vector into sf, the function is sf::st_as_sf().

pt_sf <- st_as_sf(pt_terra)

To convert a sf feature into terra, the function is terra::vect(). Be careful that non-homogeneous features (mix of points, lines and polygons) can’t be converted as a terra SpatVector object. Only homogeneous points, line or polygons can be converted into terra geometry.

pt_terra <- vect(pt_sf)

Lines

Lines are another type of vectors, made of multiple points. It is possible to create lines but in practice, it often comes from an existing spatial dataset. In our example case study, we will load the rivers for the area of interest from BD CARTO (addref).

Load lines from a geopackage file

river <- vect(here("data", "BDCARTO-River_mpl50km.gpkg"))
river <- vect(
  "https://github.com/FRBCesab/spatial-r/raw/refs/heads/main/data/BDCARTO-River_mpl50km.gpkg"
)

Handling spatial vectors

# get the number of objects, and the number of attributes
dim(river)
[1] 1500    8
# get the projection system
crs(river, describe = TRUE)
    name authority code  area             extent
1 WGS 84      EPSG 4326 World -180, 180, -90, 90
# see a subset of the data
head(river, 3)
              id                   cleabs code_hydrographique statut
1 cours_d_eau.46 COURDEAU0000002276554466 06C0000002276554466 Validé
2 cours_d_eau.52 COURDEAU0000002215481117 06C0000002215481117 Validé
3 cours_d_eau.59 COURDEAU0000002000818257 06C0000002000818257 Validé
                 toponyme statut_du_toponyme influence_de_la_maree
1              la Dourbie             Validé                  <NA>
2 Ruisseau de Montmorency             Validé                  <NA>
3      Ruisseau de Lussac             Validé                  <NA>
  caractere_permanent
1                <NA>
2                <NA>
3                <NA>

Calculate length of lines

The function terra::perim() calculate the length of lines (expressed in meter). While calculating distance and length, be careful with projection systems. Some are not suited to calculate distance. Prefer equidistant projections or use local projection system (if your study area is small). The package terra recommends the calculation of distances in lat/long to get more accurate results (considering the geodesic distance, so accounting for Earth’s curvature). (addrefslide)

# calculate the length of rivers
river$length_km <- perim(river) / 1000

# see the distribution of river length
boxplot(river$length_km, ylab = "length (km)")

# get the name of the longest river
river$toponyme[which.max(river$length_km)]
[1] "l'Hérault"

Visualization

mapview(river, z = "length_km") +
  mapview(pt_terra, col.regions = "red", color = NA)
plot(river, y = "length_km", type = "continuous")
plot(pt_terra, col = "red", add = TRUE)

Polygons

Load polygons from a shapefile

landuse <- vect(here("data", "BDCARTO-LULC_mpl50km.shp"))
landuse <- vect(
  "https://github.com/FRBCesab/spatial-r/raw/refs/heads/main/data/BDCARTO-LULC_mpl50km.shp"
)

Handling spatial vectors

# get the number of objects, and the number of attributes
dim(landuse)
[1] 2384    3
# get the projection system
crs(landuse, describe = TRUE)
    name authority code area         extent
1 WGS 84      EPSG 4326 <NA> NA, NA, NA, NA
# see a subset of the data
head(landuse, 3)
                    id                   cleabs       nature
1 occupation_du_sol.32 BDC_OCS_0000002201861697 Broussailles
2 occupation_du_sol.68 BDC_OCS_0000002201863362         Bâti
3 occupation_du_sol.98 BDC_OCS_0000002201860189    Eau libre
# see the distribution of land cover classes (number of polygons)
table(landuse$nature)

              Bâti       Broussailles Carrière, décharge          Eau libre 
               360                235                 35                 49 
             Forêt      Marais salant  Marais, tourbière            Prairie 
               455                  8                 54                707 
   Rocher, éboulis     Sable, gravier      Vigne, verger   Zone d'activités 
                 1                 18                316                146 

Visualization

mapview(landuse, z = "nature")
plot(landuse, y = "nature", type = "classes")

Calculate area

The function expanse calculate the area in \(m^2\). Again, be careful with projection systems (addrefslide). Some are not suited to calculate areas. Prefer equalarea projections or use local projection system (if your study area is small). The package terra recommends the calculation of areas in lat/long to get more accurate results (accounting for Earth’s curvature).

# calculate area of polygons
landuse$area_km2 <- expanse(landuse) * 0.000001

# see area per land use classes
tapply(landuse$area_km2, landuse$nature, sum)
              Bâti       Broussailles Carrière, décharge          Eau libre 
       474.8521879        891.4256281         16.2444372       2799.1424086 
             Forêt      Marais salant  Marais, tourbière            Prairie 
      1804.7059831         33.8696654        236.7411376       1607.4753153 
   Rocher, éboulis     Sable, gravier      Vigne, verger   Zone d'activités 
         0.3778454         18.7532616       1752.9425395        137.2554131